Difficulty setting

Charles T. Gray plays on the normal tidyverse difficulty setting.

Gaming difficulty:

  • the pew-pew sound my gun makes startles me (this workshop)
  • normal
  • hardcore

What about our other presenters?

Here to learn together

We are here to learn from you, too.

The plan is to live code in these notes, editing them as we go with the wonderful xaringan::infinite_moon_reader() function and incorporating your feedback.

Notes for this workshop

We will all work from a personal copy of these slides.

After some introductory waffling, and set-up, we’ll download the file together.

You will be able to annotate these notes or write extra code examples for yourself.

Indeed, there is little code provided in these notes. But there are lots of functions. We will write the code together.

Pseudo code

So that you have the experience of getting your hands dirty with code, we’ll use pseudo code.

<this is pseudo code>

install.packages("<you write code here>")

If I wanted to install a package called metafor, I would use the code

install.packages("metafor")

Notice that we drop the <>. That’s not code, it’s pseudo code to highlight where your input is.

Survival tips

Don’t be afraid to harness all tools available:

  • Modern day coding practice consists almost entirely of searching “how do I do <x> in <language>”.
  • The community is your best resource; talk to each other and make friends and future collaborators.
  • Cheatsheets (Help-Cheatsheets)

Finally, everyone needs to ragequit sometimes.

Stickynote signalling

yellow

Yup, all good.

pink (the ragequit option)

Some assistance would be greatly appreciated.

Also, I think I broke the internet.

What is R?

What do R-users use R for?

R users are data detectives.

We’re going to try to cover everything except Model.

This image is from R for Data Science, a great text to get started with. It’s available online as a free ebook.

For us to learn how to be data detectives, we’ll need some data.

A fitting dataset

To familiarise ourselves with R, we will do what R users do.

We will explore a dataset.

A fitting dataset.

A few images to explain why a dataset about witch trials might be appropriate for a workshop hosted by an advocacy group for underrepresented genders.

Image search for “witch” + “michelle wolf”

Image search for “witch” + “julia gillard”

Image search for “witch” + “hillary clinton”

A fitting dataset

todo discuss witchtrials dataset here - but not too much information

Perhaps Steph would like to write something here - perhaps just adapt a bit from her analysis.

Questions, questions that need answering

What would you like to know?

Group discussion

Write questions on whiteboard

How to answer these questions?

For this workshop:

Why these tools?

  • R was intentionally developed to be a data analysis language (aeroplane)
  • RStudio is designed to help users use R (airport)

“The plane is pretty boring without the airport around it.”

(Tip of the hat to Julia Lowndes for the aeroplane analogy.)

Installation hell

Installation overview

  • install R (quick)
  • install RStudio (a little while)

Installation instructions adapted with appreciation from a previous workshop.

Install R on Mac or Windows

Go to the Comprehensive R Archive Network (CRAN) website.

It was first in a Google search for ‘cran’ in June 2018.


For Mac users

  • Click on Download R for (Mac) OS X
  • Look at the top link under Latest release, which at the time of writing is R-3.5.0.pkg, and download it if it is compatible with your version of macOS (Mavericks 10.9 or higher). Otherwise, download the version beneath it, which is compatible with older macOS versions.

For Windows users

  • Click on Download R for Windows
  • Then click on the link install R for the first time
  • Download from the large link at the top of the page which at time of writing is Download R 3.5.0 for Windows.

Install R

  • Then double click the downloaded installer (R-3.5.0.pkg on Mac at the time of writing) and follow the prompts to install the software.

Download & Install RStudio:

Go to the RStudio website.

It was first in a Google search for ‘rstudio’ in June 2018.

Choose RStudio and scroll down to the blue Download RStudio Desktop button.

Click the green button to download RStudio Desktop Open Source License and select the appropriate installer for your operating system.

Double click the installer and follow the prompts to set up RStudio.

Why RStudio?

Working in an RStudio project has many benefits.

  • Free, open source software: R inside an integrated development environment (IDE)
  • Reproducible workflows

Getting started in RStudio

Create a project

  • Open RStudio and create a project via File-New Project
  • Select New Directory and choose New Project
  • Name your project rcurious
  • Save the project directory wherever suits you

Argh, so many windows

R-Ladies presenters gesticulate wildly at RStudio

Let’s start with a couple of useful panes.

  • Console
  • Environment

Cheat sheet

Help-Cheatsheets-RStudio IDE Cheatsheet

How to run code?!

Running code in the Console

The console is where you can execute single-line R commands.

The console is located, by default, in the lower left pane.

Try 3 + 2 and press enter:

# Type 3 + 2 here. Here's hoping infinite moon reader works. 

Environment

I can store the number 5 in an object x.

To assign a value we use an arrow <-.

x <- 5

What happens when you type x into the Console after assigning the value 5 to it?

What do you see in the Environment pane?


If I assign the value 5 to the object x, and call it in the console, it returns the value assigned.

# Assign 5 to x and call x.
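
In case the live coding goes sideways, here is a minimal sketch of the Console steps described above. It only uses the object x from this section; the last line is an extra illustration, not part of the original exercise.

```r
3 + 2      # the Console prints 5

x <- 5     # assign: nothing is printed, but x appears in the Environment pane
x          # calling x prints the value assigned to it, 5

x + 2      # an object can be used in further calculations
```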

There are many complex data types in R.

Data structures in R

Data objects can be made of

  • numbers
  • characters “this is a character string”
  • logicals (TRUE/FALSE)
  • list (mixed)

Or tables of combinations of the above.

##   Sepal.Length Sepal.Width Petal.Length Petal.Width Species
## 1          5.1         3.5          1.4         0.2  setosa
## 2          4.9         3.0          1.4         0.2  setosa

And much more.
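
A rough sketch of the building blocks listed above. The object names and the ages are made up for illustration; iris is a table that ships with R, and its first two rows are the ones printed above.

```r
a_number    <- 3.5
a_character <- "this is a character string"
a_logical   <- TRUE
a_list      <- list(a_number, a_character, a_logical)  # a list can mix types

class(a_number)     # "numeric"
class(a_character)  # "character"
class(a_logical)    # "logical"
class(a_list)       # "list"

# A table: each column holds one type, but columns can differ from each other.
# (Made-up values, purely for illustration.)
a_table <- data.frame(name = c("Euclid", "Beanie"), age = c(4, 2))
class(a_table)      # "data.frame"

# The first two rows of the built-in iris table.
head(iris, 2)
```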


The witch trials dataset is a table.

It is a tidy data structure:

  • rows are observations
  • columns are variables (attributes)

Editor

We’ll do this analysis in R Markdown.

  • run code in chunks
  • write in text, LaTeX, HTML, outside of code chunks
  • make websites, presentations, books

R Markdown

Opening an .Rmd

Open File-New File-R Markdown

Follow the prompts to install the required packages.

Give your document a title and press ok.

This will open an Untitled2 template.

Save your document - note, you have given your document a title, not saved it!

Press the wheel next to the knit icon at the top of the Editor pane and select Viewer.

Now we will knit our document.

The tutorial document

A treasure trove!

Get these notes

Delete everything in your .Rmd file.

Look up softloud charles gray github

You want to navigate to: https://github.com/softloud

Click on rcurious in my Pinned repositories.

Click on workshop and then workshop.Rmd and then click the Raw button and copy and paste the text into your .Rmd file.

  • control + a to select all
  • control + c to copy
  • control + v to paste

Knit the notes

control + shift + k to knit

But it is annoying having a pop out window, so let’s view these notes in the Viewer pane.

We are going to modify the YAML at the top of the document so that you can see how to toggle a presentation into notes.

Press the little document outline button in the top right corner of the Editor pane, then you can jump to the top more easily.

Change the output settings at the top of this document to:

output:
  # ioslides_presentation
  html_document:
    toc: true
    toc_float: true

Knit again. (This takes a while.)

Slides and notes

From now on, we’ll refer to both the notes and the slides.

Your notes are made from the same code as the slides we’re looking at.

If you want information from a few slides ago, you can find it in your notes.

Your turn

Navigate in the Viewer pane using the floating table of contents to the image of Hillary Clinton in the section called A fitting dataset.

Navigate back to this section in your Editor pane by using the Document Outline provided by the button in the top right of the Editor.

Annotating these notes

Now you can take notes directly in the script file for these slides.

Let’s create a section at the top of the document called Useful tips.

We’ll create a heading directly above the section called Hi. (Use your Document Outline to navigate there.)

It looks like this at the top of the script file.

## Hi! 
  • To create a heading use #. The more ###, the smaller the heading.

We can add bullet points with -

- Euclid
- Beanie

It renders in your output like this:

  • Euclid
  • Beanie

Cheatsheet

There are two cheatsheets on R Markdown in Help-Cheatsheets.

Your turn

Ask around for a “what I wish I’d known” story about R or investigate a cheatsheet.

Find a tip that sounds useful to you and add it to your notes in your Useful tips section as a bullet point.

Knit your document and bask in the pretty!

Knitting taking too long?

Try deleting the images. You can always download a fresh copy of the notes if you want to get them back.

Packages

Packages are collections of other people’s code.

Often someone has already written a script that does what you want to do.

For example, we want to import the witch trials data. We will use a package that helps with data wrangling tasks like this, the tidyverse.

We’re going to use the metapackage tidyverse to help us with our data analysis.

Functions

The most common elements of packages are functions. R also comes preloaded with a base set of commonly used functions.

Functions run other people’s code for us, so that we don’t have to reinvent the wheel.

We will use functions to install and load the tidyverse.

How to spot a function

Functions in R take the form <function name>()

To learn more about a function, type ?<function name> into the Console, and the Help pane will display documentation.
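
For example, a small sketch of looking up help and then calling a function. sqrt() is just a stand-in picked for illustration; it is not otherwise used in this workshop.

```r
# Two equivalent ways to open the documentation for install.packages().
?install.packages
help("install.packages")

# Calling a function: the name, then round brackets with the input inside.
sqrt(16)   # returns 4
```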

Installing packages

We want to install the package tidyverse.

Take a look at the help documentation for the function install.packages().

For installation; i.e., first time only.

install.packages("<name of package>")

For using.

library(<name of package>)

Pseudo code

A friendly reminder that <> indicates pseudo code. You can navigate back to that section in your notes in your Viewer.

Your turn

We would like to install and use a package called “tidyverse”.

Let’s try.

Let’s load the tidyverse

Press the little document outline button in the top right corner of the Editor pane.

Navigate to this part of your notes.

# This is a code chunk. 

# We can write informative comments with a hash # at the start.

# Load the tidyverse using the library() function.
library(tidyverse)
## ── Attaching packages ──────────────────────────────────────────────────────────── tidyverse 1.2.1 ──
## ✔ ggplot2 2.2.1     ✔ purrr   0.2.4
## ✔ tibble  1.4.2     ✔ dplyr   0.7.4
## ✔ tidyr   0.8.0     ✔ stringr 1.3.1
## ✔ readr   1.1.1     ✔ forcats 0.3.0
## ── Conflicts ─────────────────────────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()

Importing data

Import the witch trials data

Since the data is stored on an online repository, we can import it via url.

We can import this data using the read_csv() function from the tidyverse.

For our purposes, this function needs one argument, the url, which goes between the () as a “character string”.

There are many file types, which often need special care.

Presenters discuss different file types and old battle stories of importing data.

Your turn

The data is found here: https://raw.githubusercontent.com/JakeRuss/witch-trials/master/data/trials.csv

Try importing the data at the console with read_csv. What output do you see?

read_csv with the url as its argument produces a data object, which we can assign.

Open a code chunk here and read the data in using read_csv and assign <- the data to an object called witchdat

  • control+alt+i to open a code chunk.
  • press green play to run chunk.
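
If you get stuck, here is a hedged sketch of the same read-and-assign pattern on a tiny made-up CSV, so that it runs without an internet connection. The object name toydat and its contents are hypothetical. For the real thing, the pattern is witchdat <- read_csv("<the url above>").

```r
library(readr)

# A made-up two-row CSV, standing in for the real file.
csv_text <- "place,tried\nA,3\nB,7\n"

# I() tells read_csv() that the string is the data itself, not a path or url.
toydat <- read_csv(I(csv_text))

# Calling the object prints the table; it also appears in the Environment pane.
toydat
```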

Take a look

What do you see in your Environment?

Click on it!

Exploratory data analysis

Let’s explore the information in this table.

  • summary
  • wrangling and cleaning
  • visualisation

Summary functions

Lots of R objects are friendly to the summary() function: summary(<an R object>).
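
A quick sketch of summary() on a few different kinds of object, using the built-in iris table rather than the witch trials data so as not to spoil the exercise below.

```r
# A whole table: one summary per column.
summary(iris)

# A single numeric column: minimum, quartiles, mean, maximum.
summary(iris$Sepal.Length)

# A logical vector: counts of FALSE and TRUE.
summary(c(TRUE, FALSE, TRUE))
```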

Your turn

What is the output of summary for witchdat?


An alternative to summary

An alternative is the skim() function from the skimr package.

Your turn

  • install the skimr package
  • Open a code chunk here and load the skimr package in your notes
  • apply the skim function to the witchdat data

What is the difference between the output of summary and skim? Which do you like better and why?

Group discussion

Based on this new information, what questions do we add or update?

At this point, we often wish to manipulate the data in some way.

This is variously known as wrangling, cleaning, and scrubbing.

Data wrangling

The fine art of wrangling

Change the name of a variable

This workshop is based on Steph de Silva’s wonderful analysis Witch Hunting in Europe: a discovery of missingness.

One of the first things that Steph does is change the name of the column gadm.adm0 to something more human-interpretable, country.

Steph notes, “This is a gross oversimplification of geography in the middle ages of Europe, but it describes the location of the trial in terms that will be most familiar to many modern users.”

Follow in the footsteps of greatness

Let’s use the rename() function to change the name of the variable (column) gadm.adm0 to country.

To do this, we’ll learn a very useful operator, the pipe %>%.

The pipe
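
A minimal sketch of the pipe with rename(), assuming a made-up two-row table in place of the real data. The object names, the values and the tried column are hypothetical; only gadm.adm0 and country come from the text above.

```r
library(dplyr)

# Hypothetical stand-in for the witch trials table.
toy_trials <- data.frame(
  gadm.adm0 = c("Scotland", "Germany"),
  tried     = c(3, 7)
)

# Without the pipe: the data goes inside the brackets as the first argument.
rename(toy_trials, country = gadm.adm0)

# With the pipe: whatever is on the left is passed in as the first argument.
# Read %>% as "and then". Note the order in rename(): new name = old name.
toy_renamed <- toy_trials %>%
  rename(country = gadm.adm0)

names(toy_renamed)
```

The two calls do the same job. The pipe starts to pay off when several steps are chained one after another.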

Visualisation

The structure of a ggplot

The tidy way of plotting data is with the ggplot2 package, which comes with the tidyverse.
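
As a rough sketch of that structure, here is a scatterplot built from the iris table that ships with R. The choice of columns and of geom_point() is just for illustration; the witch trials plots are left for us to write together.

```r
library(ggplot2)

# A ggplot is built in layers joined by +:
#   1. data:   the table to plot
#   2. aes():  which columns map to the x and y axes
#   3. a geom: how to draw each observation (here, as points)
p <- ggplot(data = iris, mapping = aes(x = Sepal.Length, y = Sepal.Width)) +
  geom_point()

# Calling the object draws the plot in the Plots pane.
p
```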

Workflow

Working in projects

  • Put each project in its own directory, which is named after the project.
  • Put text documents associated with the project in the doc directory.
  • Put raw data and metadata in the data directory, and files generated during cleanup and analysis in a results directory.
  • Put source for the project’s scripts and programs in the src directory, and programs brought in from elsewhere or compiled locally in the bin directory.
  • Name all files to reflect their content or function.

A recommendation on how to organize your R project from Good Enough Practices for Scientific Computing as summarized by Software Carpentry
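
One possible way to set up that layout from R, sketched under the assumption that the project lives in a temporary folder so nothing on your machine is overwritten. The project_root path is hypothetical; swap in your own project directory.

```r
# Hypothetical project root inside R's temporary folder.
project_root <- file.path(tempdir(), "rcurious")

# The directories recommended in the list above.
subdirs <- c("doc", "data", "results", "src", "bin")

# Create each directory; recursive = TRUE also creates project_root if needed.
for (d in subdirs) {
  dir.create(file.path(project_root, d), recursive = TRUE, showWarnings = FALSE)
}

# Check what was created.
list.files(project_root)
```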

Further reading

The rstats community is the best

I thought you might be interested in a conversation I had while preparing this workshop. The rstats community is your best resource.

Sidenote, I’d recently seen Maelle Salmon speak about the benefits of blogging to engage with the community.

This was the first time I tweeted about a blogpost; it is very reassuring to have people to bounce ideas off.

Thank you for getting your R-curious on with us!

We hope you had fun.

Special thanks

  • All the R-Ladies who contributed to this workshop
    • Di
    • Kim
    • Steph
    • Sarah K.
    • Sarah R.
  • Miles for git support
  • test specimen Dr X

Draft notes

Installing packages

In R, the fundamental unit of shareable code is an R package.